Combinatorial Pattern Matching

chapter

Sharper Upper and Lower Bounds for an Approximation Scheme for Consensus-Pattern

Broňa Brejová, Daniel G. Brown, Ian M. Harrower, Alejandro López-Ortiz, et al.

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 1-10

We present sharper upper and lower bounds for a known polynomial-time approximation scheme due to Li, Ma and Wang [7] for the Consensus-Pattern problem. This NP-hard problem is an abstraction of motif finding, a common bioinformatics discovery task. The PTAS due to Li et al. is simple, and a preliminary implementation [8] gave reasonable results in practice. However, the previously known bounds on...

chapter

On the Longest Common Rigid Subsequence Problem

Bin Ma, Kaizhong Zhang

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 11-20

The longest common subsequence problem (LCS) and the closest substring problem (CSP) are two models for finding common patterns in strings. The two problems have been studied extensively. The former was previously proved to be not polynomial-time approximable within ratio n^δ for a constant δ. The latter was previously proved to be NP-hard and to have a PTAS....

chapter

Text Indexing with Errors

Moritz G. Maaß, Johannes Nowak

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 21-32

In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of...

chapter

A New Compressed Suffix Tree Supporting Fast Search and Its Construction Algorithm Using Optimal Working Space

Dong Kyue Kim, Heejin Park

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 33-44

The compressed suffix array and the compressed suffix tree for a given string S are full-text index data structures occupying O(n log |Σ|) bits, where n is the length of S and Σ is the alphabet from which symbols of S are drawn. When they were first introduced, they were constructed from suffix arrays and suffix trees, which implies they were not constructed in optimal O(n log |Σ|)-bit working space. Recently,...

chapter

Succinct Suffix Arrays Based on Run-Length Encoding

Veli Mäkinen, Gonzalo Navarro

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 45-56

A succinct full-text self-index is a data structure built on a text $T = t_1 t_2 \ldots t_n$, which takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern $P = p_1 p_2 \ldots p_m$ in T, and is able to reproduce any text...

chapter

Linear-Time Construction of Compressed Suffix Arrays Using o(n log n)-Bit Working Space for Large Alphabets

Joong Chae Na

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 57-67

The suffix array is a fundamental index data structure in string algorithms and bioinformatics, and the compressed suffix array (CSA) and the FM-index are its compressed versions. Many algorithms for constructing these index data structures have been developed. Recently, Hon et al. [11] proposed a construction algorithm using O(n · log log |Σ|) time and O(n log |Σ|)-bit working space, which is the fastest...

chapter

Faster Algorithms for δ,γ-Matching and Related Problems

Peter Clifford, Raphaël Clifford, Costas Iliopoulos

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 68-78

We present new faster algorithms for the problems of δ and (δ, γ)-matching on numeric strings. In both cases the running time of the proposed algorithms is shown to be O(δn log m), where m is the pattern length, n is the text length and δ a given integer. Our approach makes use of Fourier transform methods and the running times are independent of the alphabet size. $O(n\sqrt{m\log{m}})$ algorithms...

chapter

A Fast Algorithm for Approximate String Matching on Gene Sequences

Zheng Liu, Xin Chen, James Borneman, Tao Jiang

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 79-90

Approximate string matching is a fundamental and challenging problem in computer science, for which fast algorithms are in high demand in many applications, including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch problem...

chapter

Approximate Matching in the L₁ Metric

Amihood Amir, Ohad Lipsky, Ely Porat, Julia Umanski

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 91-103

Approximate matching is one of the fundamental problems in pattern matching, and a ubiquitous problem in real applications. The Hamming distance is a simple and well studied example of approximate matching, motivated by typing, or noisy channels. Biological and image processing applications assign a different value to mismatches of different symbols. We consider the problem of approximate...

chapter

An Efficient Algorithm for Generating Super Condensed Neighborhoods

Luís M. S. Russo, Arlindo L. Oliveira

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 104-115

Indexing methods for the approximate string matching problem spend a considerable effort generating condensed neighborhoods. Here, we point out that condensed neighborhoods are not a minimal representation of a pattern neighborhood. We show that we can restrict our attention to super condensed neighborhoods which are minimal. We then present an algorithm for generating Super Condensed Neighborhoods...

chapter

The Median Problem for the Reversal Distance in Circular Bacterial Genomes

Enno Ohlebusch, Mohamed Ibrahim Abouelhoda, Kathrin Hockel, Jan Stallkamp

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 116-127

In the median problem, we are given a distance or dissimilarity measure d, three genomes G₁, G₂, and G₃, and we want to find a genome G (a median) such that the sum $\sum_{i=1}^{3} d(G, G_i)$ is minimized. The median problem is a special case of the multiple genome rearrangement problem, where one wants to find a phylogenetic tree...

chapter

Using PQ Trees for Comparative Genomics

Gad M. Landau, Laxmi Parida, Oren Weimann

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 128-143

Permutations on strings representing gene clusters on genomes have been studied earlier in [3, 12, 14, 17, 18] and the idea of a maximal permutation pattern was introduced in [12]. In this paper, we present a new tool for representation and detection of gene clusters in multiple genomes, using PQ trees [6]: this describes the inner structure and the relations between clusters succinctly, aids in filtering...

chapter

Hardness of Optimal Spaced Seed Design

François Nicolas, Eric Rivals

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 144-155

Speeding up approximate pattern matching has been a line of research in stringology since the 1980s. Practically fast approaches belong to the class of filtration algorithms, in which text regions dissimilar to the pattern are excluded (filtered out) in a first step, and remaining regions are compared to the pattern by dynamic programming in a second step. Among the necessary conditions used to test similarity...

chapter

Weighted Directed Word Graph

Meng Zhang, Liang Hu, Qiang Li, Jiubin Ju

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 156-167

The Weighted Directed Word Graph (WDWG) is a new text-indexing structure based on a new compaction method for the DAWG. WDWGs are basically cyclic, which means that they may accept infinite strings. But by assigning weights to the edges, the acceptable strings are limited to the substrings of the input strings. The size of WDWGs is smaller than that of DAWGs both in theory and in practice. A linear-time on-line...

chapter

Construction of Aho Corasick Automaton in Linear Time for Integer Alphabets

Shiri Dori, Gad M. Landau

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 168-177

We present a new simple algorithm that constructs an Aho Corasick automaton for a set of patterns, P, of total length n, in O(n) time and space for integer alphabets. Processing a text of size m over an alphabet Σ with the automaton costs $O(m \log \left|\Sigma\right| + k)$, where there are k occurrences of patterns in the text.
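
For orientation, the following is a minimal sketch of the classical textbook Aho-Corasick automaton (a trie plus breadth-first failure links), not the linear-time integer-alphabet construction the chapter presents. The function names `build_automaton` and `search` are hypothetical, transitions are stored in Python dictionaries rather than in the representation the authors use, and patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists (textbook version)."""
    goto = [{}]   # goto[s][ch] -> next state
    fail = [0]    # fail[s] -> state for the longest proper suffix that is a trie prefix
    out = [[]]    # out[s] -> indices of patterns that end at state s

    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Phase 2: compute failure links in breadth-first order.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the matches reachable through the failure link.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence of a pattern in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits
```

Calling `search("ushers", ["he", "she", "his", "hers"])` reports "she" starting at index 1 and both "he" and "hers" starting at index 2. Dictionary lookups hide the alphabet-dependent cost that the abstract makes explicit in its $\log|\Sigma|$ factor.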

chapter

An Extension of the Burrows Wheeler Transform and Applications to Sequence Comparison and Data Compression

Sabrina Mantaci, Antonio Restivo, G. Rosone, Marinella Sciortino

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 178-189

We introduce a generalization of the Burrows-Wheeler Transform (BWT) that can be applied to a multiset of words. The extended transformation, denoted by E, is reversible, but, unlike the BWT, it is also surjective. The E transformation allows us to define a distance between two sequences, which we apply here to the problem of the whole mitochondrial genome phylogeny. Moreover we give...
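
As background for the generalization described above, here is a minimal sketch of the classical single-word BWT and a naive inversion, which illustrates the reversibility property. It is not the extended transformation E from the chapter. The function names `bwt` and `inverse_bwt` are hypothetical, and the end-of-word sentinel `$` is an assumption of this sketch (it must not occur in the input word).

```python
def bwt(word, sentinel="$"):
    """Classical BWT: last column of the sorted rotations of word + sentinel."""
    s = word + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, sentinel="$"):
    """Naive inversion: repeatedly prepend the last column and re-sort.

    This is quadratic in space and is meant only to show reversibility.
    """
    n = len(last)
    table = [""] * n
    for _ in range(n):
        table = sorted(last[i] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

For example, `bwt("banana")` yields `"annb$aa"`, and `inverse_bwt("annb$aa")` recovers `"banana"`.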

chapter

DNA Compression Challenge Revisited: A Dynamic Programming Approach

Behshad Behzadi, Fabrice Fessant

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 190-200

Standard compression algorithms are not able to compress DNA sequences. Recently, new algorithms have been introduced specifically for this purpose, often using detection of long approximate repeats. In this paper, we present another algorithm, DNAPack, based on dynamic programming. In comparison with former existing programs, it compresses DNA slightly better, while the cost of dynamic programming...

chapter

On the Complexity of Sparse Exon Assembly

Carmel Kent, Gad M. Landau, Michal Ziv-Ukelson

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 201-218

Gene structure prediction is one of the most important problems in computational molecular biology. A combinatorial approach to the problem, denoted Gene Prediction via Spliced Alignment, was introduced by Gelfand, Mironov and Pevzner [5]. The method works by finding a set of blocks in a source genomic sequence S whose concatenation (splicing) fits a target gene T belonging to a homologous species...

chapter

An Upper Bound on the Hardness of Exact Matrix Based Motif Discovery

Paul Horton, Wataru Fujibuchi

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 219-228

Motif discovery is the problem of finding local patterns or motifs from a set of unlabeled sequences. One common representation of a motif is a Markov model known as a score matrix. Matrix based motif discovery has been extensively studied but no positive results have been known regarding its theoretical hardness. We present the first non-trivial upper bound on the complexity (worst-case computation...

chapter

Incremental Inference of Relational Motifs with a Degenerate Alphabet

Nadia Pisanti, Henry Soldano, Mathilde Carpentier

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 229-240

In this paper we define a new class of problems that generalizes that of finding repeated motifs. The novelty lies in the addition of constraints on the motifs in terms of relations that must hold between pairs of elements of the motifs. For this class of problems we give an algorithm that is a suitable extension of the KMR [3] paradigm and, in particular, of the KMRC [7] as it uses a degenerate alphabet...


Combinatorial Pattern Matching
16th Annual Symposium, CPM 2005, Jeju Island, Korea, June 19-22, 2005. Proceedings
